Recognition of Phonemes in A-cappella Recordings using Temporal Patterns and Mel Frequency Cepstral Coefficients
Abstract
In this paper, a new method for recognizing phonemes in singing is proposed. Unlike regular speech recognition, phoneme recognition in singing has not yet converged on a standard method. Standard speech recognition methods have already been evaluated on a cappella vocal recordings, but they perform worse there than on regular speech. This paper proposes two alternative classification methods to address this problem: one uses Mel-Frequency Cepstral Coefficient (MFCC) features, and the other uses Temporal Patterns (TRAPs). The two are then combined into a single classifier that outperforms either one on its own. Experiments are carried out on US English songs. The preliminary result is a phoneme recall rate of 48.01%, averaged over all audio frames within a song.
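The abstract does not say exactly how the TRAP features are built or how the two classifiers are merged. The sketch below shows one common reading under stated assumptions. A TRAP is taken to be the log-energy trajectory of a single frequency band over a long window centred on the current frame (TRAP systems typically use on the order of one second of context), normalized to zero mean and unit variance. The two classifiers are assumed to be combined by a weighted average of their per-frame phoneme posteriors (late fusion). Frame recall is taken to be the fraction of frames whose top-scoring phoneme matches the reference label. The function names, the `weight` parameter, the edge-padding scheme and the toy posteriors are all hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence

Posteriors = List[List[float]]  # one row per frame, one column per phoneme


def trap_vector(log_band_energy: Sequence[float], t: int, half_width: int) -> List[float]:
    """Temporal pattern for ONE frequency band, centred on frame t.

    Assumption: frames outside the signal are filled by repeating the edge frame.
    The window is normalized to zero mean and unit variance.
    """
    n = len(log_band_energy)
    window = [log_band_energy[min(max(i, 0), n - 1)]
              for i in range(t - half_width, t + half_width + 1)]
    mean = sum(window) / len(window)
    var = sum((x - mean) ** 2 for x in window) / len(window)
    std = math.sqrt(var) if var > 0 else 1.0  # avoid dividing by zero on flat bands
    return [(x - mean) / std for x in window]


def fuse_posteriors(p_mfcc: Posteriors, p_trap: Posteriors, weight: float = 0.5) -> Posteriors:
    """Hypothetical late fusion: per-frame weighted average of two posterior matrices."""
    return [[weight * a + (1.0 - weight) * b for a, b in zip(fa, fb)]
            for fa, fb in zip(p_mfcc, p_trap)]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def frame_recall(posteriors: Posteriors, reference: Sequence[int]) -> float:
    """Fraction of frames whose highest-posterior phoneme equals the reference label."""
    hits = sum(1 for p, ref in zip(posteriors, reference) if argmax(p) == ref)
    return hits / len(reference)


if __name__ == "__main__":
    # Made-up toy data: 4 frames, 3 phoneme classes (indices 0, 1, 2).
    reference = [0, 1, 2, 1]
    p_mfcc = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.2, 0.3, 0.5], [0.3, 0.6, 0.1]]
    p_trap = [[0.5, 0.3, 0.2], [0.2, 0.7, 0.1], [0.3, 0.4, 0.3], [0.5, 0.3, 0.2]]

    print(frame_recall(p_mfcc, reference))                          # 0.75
    print(frame_recall(p_trap, reference))                          # 0.5
    print(frame_recall(fuse_posteriors(p_mfcc, p_trap), reference)) # 1.0
```

With these made-up posteriors, each classifier is wrong on frames where the other is right, so averaging them recovers every frame. This only illustrates why combining complementary classifiers can help. It does not reproduce the paper's 48.01% figure or its actual combination scheme.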
Similar resources
Pronunciation recognition of English phonemes /ə/, /æ/, /ɑː/ and /ʌ/ using Formants and Mel Frequency Cepstral Coefficients
The Vocal Joystick Vowel Corpus, by Washington University, was used to study monophthongs pronounced by native English speakers. The objective of this study was to quantitatively measure the extent to which speech recognition methods can distinguish between similar-sounding vowels. In particular, the phonemes /ə/, /æ/, /ɑː/ and /ʌ/ were analysed. 748 sound files from the corpus were used and su...
Voice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
Measuring Acoustic Reduction in Feature Space
Modelling varying speaking style remains a challenge for state-of-the-art speech recognition and synthesis systems. Vowel and consonant reduction have been identified as correlated with speaking style variation, but still lack a common measurement. The reduction phenomena are often observed without consideration of coarticulation and assimilation effects, and as a result of speaking rate variabil...
A Comparative Study Of LPCC And MFCC Features For The Recognition Of Assamese Phonemes
In this paper two popular feature extraction techniques Linear Predictive Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC) have been investigated and their performances have been evaluated for the recognition of Assamese phonemes. A multilayer perceptron based baseline phoneme recognizer has been built and all the experiments have been carried out using that recognize...
The Capacity of Mel Frequency Cepstral Coefficients for Speech Recognition
Speech recognition makes an important contribution to new technologies in human-computer interaction. Today, there is a growing need to employ speech technology in daily life and business activities. However, speech recognition is a challenging task that requires several stages before the desired output is obtained. Among automatic speech recognition (ASR) components is the feature ex...